System and method for providing step size control for subband affine projection filters for echo cancellation applications

ABSTRACT

A system and method for Acoustic Echo Cancellation. The system and method include a subband affine projection filter and a variable step size controller configured to cancel an estimated echo from a near-end signal. The system and method also include a divergence detector adapted to reset the subband affine projection filter in response to determining a divergence is occurring. Additionally, the system and method include a double talk detector adapted to transmit a signal to mask an output signal when double talk is detected.

CROSS-REFERENCE TO RELATED APPLICATION(S) AND CLAIM OF PRIORITY

The present application is related to U.S. Provisional Patent No. 61/002,897, filed Nov. 13, 2007, entitled “SYSTEM AND METHOD FOR PROVIDING STEP SIZE CONTROL FOR SUBBAND AFFINE PROJECTION FILTERS FOR ECHO CANCELLATION APPLICATIONS”. Provisional Patent No. 61/002,897 is assigned to the assignee of the present application and is hereby incorporated by reference into the present application as if fully set forth herein. The present application hereby claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent No. 61/002,897.

TECHNICAL FIELD

This disclosure is generally directed to systems and methods for echo cancellation in electronic devices.

BACKGROUND OF THE INVENTION

Echo cancellation, as used in telephone communications, is the process of removing echo from a voice communication in order to improve voice quality on a telephone call.

SUMMARY OF THE INVENTION

A system for Acoustic Echo Cancellation is disclosed. The system includes a near-end interface comprising a microphone and a speaker. The system further includes a far-end interface and an acoustic cancellation device coupled between the near-end interface and the far-end interface. The acoustic cancellation device is configured to use a subband affine projection process wherein at least one of a step size and a regularization factor is determined by a variable step size controller.

A method for Acoustic Echo Cancellation is disclosed. The method comprises estimating an echo from a received signal using a subband affine projection filter. The echo is estimated by determining at least one of an optimal step size and an optimal regularization factor. The method further includes cancelling the estimated echo from a near-end signal to produce an AEC output signal.

The method further includes detecting a divergence of the subband affine projection filter and resetting the subband affine projection filter in response to determining that a divergence is occurring. Additionally, the method includes detecting double talk and masking an output signal when double talk is detected.

An apparatus for acoustic echo cancellation is disclosed. The apparatus includes a near-end interface, a far-end interface and an adaptive filter coupled between the near-end interface and the far-end interface. The apparatus further includes a variable step size controller coupled to the adaptive filter. The variable step size controller is configured to determine an optimal step size or an optimal regularization factor for use by the adaptive filter.

Before undertaking the Detailed Description of the Invention below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:

FIG. 1a illustrates an exemplary acoustic echo canceller according to one embodiment of the present disclosure;

FIG. 1b illustrates a simple block diagram of an acoustic echo cancellation system according to one embodiment of the present disclosure;

FIG. 2 illustrates an exemplary circuit for a Subband Affine Projection process according to one embodiment of the present disclosure;

FIG. 3 illustrates a simple block diagram for updating taps using a Subband Affine Projection (“SAP”) process according to one embodiment of the present disclosure;

FIG. 4 illustrates a simple block diagram for echo cancellation using methods of delay coefficients according to embodiments of the present disclosure;

FIG. 5 illustrates a simple block diagram for echo cancellation using a method of non-parametric variable step size according to one embodiment of the present disclosure;

FIG. 6 illustrates a simple block diagram for echo cancellation using a method of variable regularization factor according to one embodiment of the present disclosure;

FIG. 7 illustrates a simple block diagram of the operation of a Divergence Detector according to one embodiment of the present disclosure;

FIG. 8 illustrates a simple block diagram of the operation of a Double Talk Detector (DTD) 156 according to one embodiment of the present disclosure;

FIG. 9 illustrates a simple output graph utilizing processes for echo cancellation;

FIG. 10 illustrates a simple output graph utilizing echo cancellation processes according to embodiments of the present disclosure; and

FIG. 11 illustrates a simple output graph utilizing mean-square tap error according to embodiments of the present disclosure.

While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the descriptions of example embodiments do not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.

DETAILED DESCRIPTION

FIGS. 1a through 11, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged wireless or wireline communication network.

In addition to improving subjective quality, the echo cancellation process increases the capacity achieved through silence suppression by preventing echo from traveling across a network.

Echo can be described in two forms: acoustic echo and hybrid echo. Speech compression techniques and digital processing delay often make these echoes more severe in telephone networks.

Echo cancellation involves first recognizing the originally transmitted signal that re-appears, with some delay, in the transmitted or received signal. Once the echo is recognized, it can be removed by “subtracting” it from the transmitted or received signal. This technique is generally implemented using a digital signal processor (hereinafter “DSP”), but can also be implemented in software. Echo cancellation is done using either echo suppressors or echo cancellers, or in some cases both.

Acoustic echo cancellation: Acoustic echo arises when sound from a loudspeaker (for example, the earpiece of a telephone handset) is picked up by the microphone in the same room (for example, the microphone in the very same handset). The problem exists in any communications scenario where there is a speaker and a microphone. Examples of acoustic echo are found in everyday surroundings such as:

- Hands-free car phone systems;
- A standard telephone in speakerphone or hands-free mode;
- Conference phones;
- Installed room systems which use ceiling speakers and microphones on the table; and
- Physical coupling (e.g., vibrations of the loudspeaker transfer to the microphone via the handset casing).

Direct acoustic path echo occurs when direct sound from the loudspeaker (not the person at the far end, otherwise referred to as the “Talker”) enters the microphone almost unaltered. Direct acoustic path echo occurs in most echo cases. The difficulties in cancelling acoustic echo stem from the alteration of the original sound by the ambient space. This alteration colours the sound that re-enters the microphone. These changes can include certain frequencies being absorbed by soft furnishings, and reflection of different frequencies at varying strength. These secondary reflections are not strictly referred to as echo, but rather as reverberation.

Acoustic echo is heard by the far-end talkers in a conversation. Therefore, if a person in Room A talks, they will hear their voice bounce around in Room B. If this sound is not cancelled, it will be sent back to its origin. Due to the slight round-trip transmission delay, this acoustic echo can be very distracting.

Hybrid or network echo is generated by the public switched telephone network (hereinafter “PSTN”) through the reflection of electrical energy by a device called a hybrid. Most telephone local loops are two-wire circuits while transmission facilities are four-wire circuits; the hybrid converts between the two. Each hybrid produces echoes in both directions, though the far-end echo is usually a greater problem for voiceband.

This application relates to echo cancellation methods. Echo cancellation is very important in telephony networks such as Voice over Internet Protocol (VOIP) networks. Adaptive filters are used to provide echo cancellation. For echoes with a long tail length, it is generally advisable to use a subband affine projection process. Adaptive filters can diverge, meaning that the signal output of the echo cancellation unit increases with time. This situation is very undesirable. A double talk detector is usually employed to detect the presence of double talk, which occurs when near-end speech activity exists. When double talk is detected, the adaptation of the filter is frozen. Alternatively, a step size control logic can be employed that constantly determines and updates the optimum step size of the adaptive filter. The present invention provides step size control logic processes for subband affine projection adaptive filters that are very robust with respect to double talk and changing echo cancellation scenarios.

FIG. 1a illustrates an exemplary communication device 100 with Acoustic Echo Cancellation (hereinafter “AEC”). The communication device 100 includes communication interfaces 102, 104; a speaker 106; a microphone 108; and an adaptive filter 110.

The communication device 100 receives a far end signal “u” 112 from a far end device (not illustrated) via communication input interface 102. The speaker 106 broadcasts (e.g., plays out) the far end signal “u” 112. The far end signal “u” 112 projects throughout a room (not specifically illustrated) in which the communication device 100 is contained. The far end signal “u” 112 reflects off surfaces in the room. The reflections include a return of the full far end signal “u” 112 from certain surfaces and a return of a partial far end signal “u” 112 from other surfaces. Some surfaces may fully absorb the far end signal “u” 112, thus reflecting none of the far end signal “u” 112. The reflections of the far end signal “u” 112 form an echo “f” 116.

The microphone 108 senses the echo “f” 116, any voice inputs from a user, and any background noise. The echo “f” 116, voice inputs and background noise are sensed by the microphone 108 as a near end signal “d” 114 (the near end signal “d” 114 is also called double talk).

The adaptive filter 110 has a first interface 118, a second interface 120 and a third interface 122. The adaptive filter 110 receives, via the first interface 118, the far end signal “u” 112. The adaptive filter 110 also receives, via the third interface 122, an error signal “e” 124. The adaptive filter 110 estimates and generates an estimated echo “f′” 126, which it outputs via the second interface 120 to cancel the echo “f” 116 in the near end signal “d” 114. The following equations represent an exemplary operation at a node 128.

$d = f + v$  EQN (1)

where v is local speech (or any local signal) plus noise.

$e = d - f'$  EQN (2)

Substituting EQN (1) into EQN (2) gives $e = f - f' + v$.

The error signal “e” 124 is produced by cancelling the estimated echo “f′” 126 from the near end signal “d” 114. The acoustic echo cancellation device 100 transmits the error signal “e” 124 along communication output interface 104. As such, if the adaptive filter 110 is able to accurately generate an estimated echo “f′” 126 substantially equal to echo “f” 116, the error signal approaches a signal that substantially represents only voice and background noise (e.g., if f′ = f, then e = v).

Telephone-band speech is commonly sampled at eight (8) kilohertz (“kHz”). Sampling at 8 kHz means that one hundred sixty (160) samples are taken every twenty (20) milliseconds (“ms”). Each 20 ms segment is referred to as a block. Samples are denoted by x(n), where n is a time instant. For example, the first sample in the first block is denoted as x(1) and the second sample in the first block is denoted as x(2). The first sample in the second block, i.e., the one hundred sixty-first sample, is denoted as x(161) because it occurs at time instant n = 161.
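As a small worked example of the block and index convention above, the sketch below maps a 1-based time instant n to its block number and its position within the 160-sample block. The 8 kHz rate and 20 ms block length come from the text; the function name `block_position` is hypothetical.

```python
SAMPLE_RATE_HZ = 8000                             # sampling rate from the text
BLOCK_MS = 20                                     # block duration in milliseconds
BLOCK_LEN = SAMPLE_RATE_HZ * BLOCK_MS // 1000     # 160 samples per block


def block_position(n: int) -> tuple:
    """Map a 1-based time instant n to (block number, position in block), both 1-based."""
    if n < 1:
        raise ValueError("time instants start at n = 1")
    block = (n - 1) // BLOCK_LEN + 1
    position = (n - 1) % BLOCK_LEN + 1
    return block, position


if __name__ == "__main__":
    for n in (1, 2, 160, 161):
        print(n, block_position(n))   # x(161) is the first sample of block 2
```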

At any time instant n, a matrix of the present and previous samples canbe constructed. For example:

${X(n)} = \begin{bmatrix}{x(n)} & {x\left( {n - 1} \right)} \\{x\left( {n - 1} \right)} & {x\left( {n - 2} \right)} \\{x\left( {n - 2} \right)} & {x\left( {n - 3} \right)}\end{bmatrix}$

At n=512, the matrix would appear as:

${X(512)} = \begin{bmatrix}{x(512)} & {x(511)} \\{x(511)} & {x(510)} \\{x(510)} & {x(509)}\end{bmatrix}_{3 \times 2}$

The dimension of the matrix is indicated at the bottom right-hand corner; a 3×2 matrix is illustrated. However, it should be understood that many other sized matrices can be constructed. Vectors (row and column matrices) are illustrated in lowercase while matrices are illustrated in uppercase. An identity matrix of dimension k×k is denoted as $I_k$. Additionally, zeros(1, N) is a 1×N row matrix of zeros, and ones(1, N) is a 1×N row matrix of ones. Some examples of matrices and matrix concatenation are:

${{zeros}\left( {1,3} \right)} = \begin{bmatrix}0 & 0 & 0\end{bmatrix}$ ${{zeros}\left( {3,1} \right)} = \begin{bmatrix}0 \\0 \\0\end{bmatrix}$${{zeros}\left( {2,2} \right)} = {0_{2 \times 2} = \begin{bmatrix}0 & 0 \\0 & 0\end{bmatrix}}$ ${{ones}\left( {1,3} \right)} = {{\begin{bmatrix}1 & 1 & 1\end{bmatrix}\begin{bmatrix}{{zeros}\left( {1,3} \right)} & {{ones}\left( {1,3} \right)}\end{bmatrix}} = {{\begin{bmatrix}0 & 0 & 0 & 1 & 1 & 1\end{bmatrix}\left\lbrack {{{zeros}\left( {1,3} \right)};{{ones}\left( {1,3} \right)}} \right\rbrack} = {\begin{bmatrix}{{zeros}\left( {1,3} \right)} \\{{ones}\left( {1,3} \right)}\end{bmatrix} = \begin{bmatrix}0 & 0 & 0 \\1 & 1 & 1\end{bmatrix}}}}$ $X = \begin{bmatrix}A & B & C \\D & E & F \\G & H & I\end{bmatrix}$ $Y = \begin{bmatrix}a & b & c \\d & e & f \\g & h & i\end{bmatrix}$ ${X \otimes Y} = \begin{bmatrix}{Aa} & {bB} & {cC} \\{dD} & {eE} & {fF} \\{gG} & {hH} & {iI}\end{bmatrix}$

The sum of all elements of a matrix is given by SUM(·). Some additional matrix notation examples are:

$A = \begin{bmatrix}1 & 2 \\ 7 & 4\end{bmatrix}$, SUM(A) = 14, $A_{1,2} = 2$, $A_{2,2} = 4$

$b = \begin{bmatrix}4 \\ 6 \\ 9\end{bmatrix}$, $b_2 = 6$

Additionally, the i-th element of vector b is denoted as $b_i$, while the (i, j)-th element of matrix A is denoted by $A_{ij}$. If the vector changes with time instant n, the vector is denoted as s(n) and its i-th element is denoted as $s_i(n)$. Further, the trace of a matrix, TR(A), is the sum of its main diagonal elements. For example, TR(A) = 1 + 4 = 5.
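The notation above can be mirrored with a few plain-Python helpers, which makes the worked values (SUM(A) = 14, TR(A) = 5, $A_{1,2} = 2$) easy to check. The helper names are hypothetical; `elementwise` implements X ⊗ Y in the element-by-element sense shown in the example above (not the Kronecker product), and `elem` uses the text's 1-based indices.

```python
from typing import List

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0] * cols for _ in range(rows)]


def ones(rows: int, cols: int) -> Matrix:
    return [[1] * cols for _ in range(rows)]


def hcat(a: Matrix, b: Matrix) -> Matrix:
    """[A B]: place B to the right of A (same number of rows)."""
    if len(a) != len(b):
        raise ValueError("row counts differ")
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def vcat(a: Matrix, b: Matrix) -> Matrix:
    """[A; B]: stack B below A."""
    return [list(row) for row in a] + [list(row) for row in b]


def elementwise(a: Matrix, b: Matrix) -> Matrix:
    """X ⊗ Y as used in the text: element-by-element product."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def msum(a: Matrix) -> float:
    """SUM(.): sum of all elements."""
    return sum(sum(row) for row in a)


def trace(a: Matrix) -> float:
    """TR(.): sum of the main diagonal of a square matrix."""
    if any(len(row) != len(a) for row in a):
        raise ValueError("trace needs a square matrix")
    return sum(a[i][i] for i in range(len(a)))


def elem(a: Matrix, i: int, j: int) -> float:
    """A_ij with the text's 1-based indices."""
    return a[i - 1][j - 1]


if __name__ == "__main__":
    A = [[1, 2], [7, 4]]
    print(msum(A), trace(A), elem(A, 1, 2), elem(A, 2, 2))
```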

The echo tail length is L samples (i.e., the adaptive filter 110 has L taps). The first b taps are set to zero (0) by delaying the speaker 106 feed by b samples. The projection order in the full band is denoted by P, and the number of subbands is M. The projection order in each subband is $P_s = P/M$, and each subband adaptive filter has $L_s = L/M$ taps. The product of $P_s$ and the near-end background noise power, in the absence of Double Talk (hereinafter “DT”), is denoted by σ².
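A minimal sketch of how the subband dimensions follow from the full-band parameters is shown below. The class name `SubbandConfig` and the example values are hypothetical, and the divisibility check is an assumption: the text divides L, P and b by M (b/M also appears as a summation limit in EQN (3)), which only yields integers when each is a multiple of M.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubbandConfig:
    """Hypothetical container for the dimensions defined in the text."""
    L: int  # echo tail length, i.e. number of full-band taps
    M: int  # number of subbands
    P: int  # projection order in the full band
    b: int  # number of leading taps forced to zero by delaying the speaker feed

    def __post_init__(self) -> None:
        # Assumption: L, P and b are multiples of M so that L/M, P/M and b/M are integers.
        for name, value in (("L", self.L), ("P", self.P), ("b", self.b)):
            if value % self.M != 0:
                raise ValueError(f"{name}={value} is not a multiple of M={self.M}")

    @property
    def Ls(self) -> int:
        """Taps per subband adaptive filter, L_s = L / M."""
        return self.L // self.M

    @property
    def Ps(self) -> int:
        """Projection order per subband, P_s = P / M."""
        return self.P // self.M

    @property
    def zero_taps_per_subband(self) -> int:
        """b / M, the number of leading taps per subband used by EQN (3)."""
        return self.b // self.M


if __name__ == "__main__":
    cfg = SubbandConfig(L=512, M=4, P=8, b=16)   # illustrative values only
    print(cfg.Ls, cfg.Ps, cfg.zero_taps_per_subband)
```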

Referring now to FIG. 1b, Acoustic Echo Cancellation 150 components of communication device 100 according to embodiments of the present disclosure are illustrated. The AEC 150 includes an Adaptive Filter 152 (hereinafter “AF”). The AF 152 estimates the echo and cancels it from the near-end signal. In one embodiment, the AF 152 is a subband affine projection AF. The AEC 150 also includes a Variable Step Size controller 154 (hereinafter “VSS”). The VSS 154 controls the step size in the AEC 150. The step size set by the VSS 154 is initially high; however, over time it takes low values. Additionally, the step size takes very low values during silence periods of the echo or during the presence of DT. In additional embodiments, a different manifestation of the step size, called a regularization factor, is utilized.

The VSS 154 includes a processor or special purpose controller adapted to perform a series of functions necessary to control the step size in the AEC 150. The VSS 154 also includes a storage means configured to store a plurality of instructions configured to cause the processor to perform the series of functions. The storage means can be any computer readable medium; for example, the storage means can be any electronic, magnetic, electromagnetic, optical, electro-optical, electromechanical, and/or other physical device that can contain, store, communicate, propagate, or transmit a computer program, software, firmware, or data for use by the microprocessor or other computer-related system or method.

The AEC 150 also includes a Double Talk Detector 156 (hereinafter “DTD”). The DTD 156 is configured to output a “1” if the DTD 156 detects double talk and a “0” if no DT is detected. The outputs of the DTD 156 are passed through a MUX 158 to mask the output 160 of the AEC 150. If the DTD 156 detects DT, the AEC output 160 is not masked. However, if no DT is detected by the DTD 156, the AEC output 160 is masked by the MUX 158 receiving the “0” signal from the DTD 156. The AEC output 160 is masked completely because it may contain residual echo; as a result, a comfort noise 162 is sent to the far end.
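The MUX 158 selection described above reduces to a simple rule per block. The sketch below assumes block-wise processing and that the comfort noise 162 block is supplied by some generator not shown here; the function name `mux_output` is hypothetical.

```python
from typing import List, Sequence


def mux_output(dtd_flag: int, aec_block: Sequence[float],
               comfort_noise: Sequence[float]) -> List[float]:
    """Choose what is sent to the far end for one block.

    dtd_flag == 1 (DT detected): the AEC output 160 passes through unmasked.
    dtd_flag == 0 (no DT): the AEC output is masked and comfort noise 162 is sent,
    so any residual echo in the AEC output does not reach the far end.
    """
    if dtd_flag not in (0, 1):
        raise ValueError("DTD output must be 0 or 1")
    if len(aec_block) != len(comfort_noise):
        raise ValueError("blocks must have equal length")
    return list(aec_block) if dtd_flag == 1 else list(comfort_noise)


if __name__ == "__main__":
    print(mux_output(1, [0.1, 0.2], [0.01, -0.01]))   # double talk: AEC output
    print(mux_output(0, [0.1, 0.2], [0.01, -0.01]))   # no double talk: comfort noise
```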

Further, the AEC 150 includes a Divergence Detector 164. The Divergence Detector 164 is configured to detect whether the AF 152 is diverging. If the AF 152 is diverging, the Divergence Detector 164 is further configured to reset the AF 152.

Embodiments of the present disclosure comprise step size control logic processes to control the step size within the VSS 154. Some step size control logic processes are: the Method of delay coefficients; Nonparametric variable step size control logic for fullband NLMS; and Variable regularization step size control. The processes can be implemented using the processor or special purpose controller.

For the Method of delay coefficients, a variety of step size control logic processes exist for full band NLMS. In particular, the method of “delay coefficients” makes use of the tap weight error of the first few taps to compute the optimum step size. In this method, an artificial delay is introduced into the system by delaying the input of the far end signal to the loudspeaker by a few samples. Because the true echo path then has no energy in those leading taps, any nonzero value the adaptive filter assigns to them is tap error.

In Variable regularization step size control, the step size is one way of scaling the adaptation update of the adaptive filter. Normally, a division or matrix inversion must be computed in the tap update of the adaptive filter. To prevent division by zero or inversion of a singular matrix, a small regularization factor is added. The usual tap update uses a constant regularization factor and a continuously updated step size. An alternative is to fix the step size and continuously update the regularization factor. Both forms are equivalent.
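The equivalence is easiest to see for projection order one (an NLMS-style update), where the normalization is the scalar $x^T x + \delta$ and the update gain is $\mu / (x^T x + \delta)$. The sketch below, under that assumption, computes the regularization δ′ that reproduces the same gain with the step size fixed at 1. For higher projection orders the normalization is a matrix, and this exact scalar mapping is not claimed here. The function names are hypothetical. Note also that δ′ becomes negative when μ exceeds the fixed step size.

```python
def nlms_gain(mu: float, delta: float, energy: float) -> float:
    """Scalar gain mu / (x^T x + delta) applied to a projection-order-1 update."""
    return mu / (energy + delta)


def equivalent_delta(mu: float, delta: float, energy: float, mu_fixed: float = 1.0) -> float:
    """Regularization that, with step size mu_fixed, reproduces the gain of (mu, delta)."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return mu_fixed * (energy + delta) / mu - energy


if __name__ == "__main__":
    energy = 4.0                                  # x^T(n) x(n) for the current input vector
    d_new = equivalent_delta(0.5, 0.01, energy)   # fix mu = 1 and move the control into delta
    print(d_new, nlms_gain(0.5, 0.01, energy), nlms_gain(1.0, d_new, energy))
```

Halving the step size here corresponds to roughly doubling the effective denominator, so a larger regularization factor plays the role of a smaller step size.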

Generally, it is preferred that the adaptive filter converge quickly. The RLS filter converges more quickly than the LMS filter; however, the complexity of the RLS filter is higher than that of the LMS filter. In some embodiments, a class of adaptive filters referred to as affine projection filters is used. These filters have RLS-like convergence with LMS-like complexity.

When the echo tail length is long, very large filters with many taps are required, which increases complexity. Subband adaptive filters are used to improve convergence speed and reduce the complexity.

Referring now to FIG. 2, an exemplary circuit for a Subband Affine Projection process 200 according to one embodiment of the present disclosure is illustrated. A far end signal u(n) 112 is input to a set of analysis filters $h_0, h_1, \ldots, h_{M-1}$ 202. The analysis filters 202 yield M subband signals $u_0(n)$ 212, …, $u_{M-1}(n)$ 214. Each subband signal 212-214 is split into M parallel signals 212, 216, 218, each of which is a delayed version of the preceding one. For example, the first subband signal $u_0(n)$ 212 is split into M parallel signals $u_0(n)$ 212, $u_0(n)z^{-1}$ 216, …, $u_0(n)z^{-M+1}$ 218. In this example, $z^{-1}$ 216 denotes a one-sample delay of $u_0(n)$ 212 and $z^{-M+1}$ 218 denotes an (M−1)-sample delay of $u_0(n)$ 212. The parallel signals 212, 216, 218 are input into M decimators 220 and M adaptive filters 222, $s_0(n), s_1(n), \ldots, s_{M-1}(n)$. The outputs of the adaptive filters 222 are summed to yield $y_0(n)$ 224.

The near end signal d(n) 116 is input into a set of analysis filters $h_0, \ldots, h_{M-1}$ 232. The outputs of the analysis filters 232 are passed through M decimators 234 to yield M subband signals $d_0(n)$ 236, …, $d_{M-1}(n)$ 238. Thereafter, M subband error signals $e_0(n)$ 254, …, $e_{M-1}(n)$ 256 are computed using $d_0(n)$ 236, …, $d_{M-1}(n)$ 238 and $y_0(n)$ 224, …, $y_{M-1}(n)$ 226, respectively.
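The delay-and-decimate branching of FIG. 2 can be sketched for one subband signal as follows: branch j is the subband signal delayed by j samples ($z^{-j}$) and then decimated by M. This is a sketch under stated assumptions, not the circuit itself. The analysis filters are omitted, samples before time 0 are taken as zero, and the decimator is assumed to keep phase 0. The function name `polyphase_components` is hypothetical.

```python
from typing import List, Sequence


def polyphase_components(u_i: Sequence[float], M: int) -> List[List[float]]:
    """Split one subband signal into M delayed-and-decimated branches.

    Branch j is u_i delayed by j samples (z^-j) and then decimated by M, i.e.
    u_ij[k] = u_i[k*M - j]. Samples before time 0 are taken as zero, and the
    decimator is assumed to keep phase 0 (every M-th sample starting at 0).
    """
    num_out = len(u_i) // M
    branches = []
    for j in range(M):
        branch = [u_i[k * M - j] if k * M - j >= 0 else 0.0 for k in range(num_out)]
        branches.append(branch)
    return branches


if __name__ == "__main__":
    for j, branch in enumerate(polyphase_components([1, 2, 3, 4, 5, 6, 7, 8], M=2)):
        print(j, branch)
```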

It may be advantageous to set forth the calculations utilized by the processes in embodiments of the present disclosure. All calculations are performed at time instant n.

A mean-square tap error (hereinafter also “MSTE” and “MSTE(n)”) D²(n) is calculated using Equation (3) below. D²(n) sums the squares of the first b/M taps of each of the M subband filters; these taps are ideally zero because of the b-sample delay of the speaker feed.

$\begin{matrix}{D^{2}(n) = \sum\limits_{i=0}^{M-1}\sum\limits_{j=0}^{\frac{b}{M}-1} s_{iL_{s}+1+j}^{2}(n)} & {\mathrm{EQN}\ (3)}\end{matrix}$

$U_{ij}(n)$ is a subband polyphase matrix (SPM_{ij}). There are M² such matrices (i, j = 0 to M−1). $U_{ij}(n)$ is calculated using Equation (4) below.

$\begin{matrix}{U_{ij}(n) = \begin{bmatrix}u_{ij}(n) & u_{ij}(n-1) & \cdots & u_{ij}(n-P_{s}+1) \\ u_{ij}(n-1) & u_{ij}(n-2) & \cdots & u_{ij}(n-P_{s}) \\ \vdots & \vdots & \ddots & \vdots \\ u_{ij}(n-L_{s}+1) & u_{ij}(n-L_{s}) & \cdots & u_{ij}(n-L_{s}-P_{s}+2)\end{bmatrix}_{L_{s} \times P_{s}}} & {\mathrm{EQN}\ (4)}\end{matrix}$

$X^T(n)X(n)$ is the far end correlation matrix (FECM(n)). It is calculated, using Equation (6) below, from the far end matrix X(n) (FEM(n)) given by Equation (5).

$\begin{matrix}{X(n) = \begin{bmatrix}U_{00}(n) & U_{10}(n) & \cdots & U_{(M-1)0}(n) \\ U_{01}(n) & U_{11}(n) & \cdots & U_{(M-1)1}(n) \\ \vdots & \vdots & \ddots & \vdots \\ U_{0(M-1)}(n) & U_{1(M-1)}(n) & \cdots & U_{(M-1)(M-1)}(n)\end{bmatrix}_{ML_{s} \times MP_{s}}} & {\mathrm{EQN}\ (5)} \\ {\mathrm{FECM}(n) = X^{T}(n)X(n)} & {\mathrm{EQN}\ (6)}\end{matrix}$

$U_{ij}^{T}(n)U_{ij}(n)$ is a polyphase subband correlation matrix (PSCM_{ij}). There are M² such matrices (i, j = 0 to M−1).

The inverse of the regularized far end correlation matrix (IFECM(n)) is $\Pi^{-1}(n)$, calculated using Equation (7) below.

$\Pi^{-1}(n) = \left[X^{T}(n)X(n) + \delta I_{MP_{s}}\right]^{-1}$  EQN (7)
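A direct way to evaluate EQN (7) is Gauss-Jordan elimination on $[X^T X + \delta I \mid I]$, sketched below in plain Python. The text notes that IFECM(n) can instead be updated recursively from IFECM(n−1); this sketch does not attempt that. With δ = 0 and a rank-deficient $X^T X$, the function raises, which illustrates why a regularization factor is added. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def regularized_inverse(R: Matrix, delta: float) -> Matrix:
    """EQN (7): return [R + delta*I]^-1 for a square R (here R = X^T(n) X(n)).

    Uses Gauss-Jordan elimination with partial pivoting on [R + delta*I | I].
    """
    n = len(R)
    aug = [[R[i][j] + (delta if i == j else 0.0) for j in range(n)]
           + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular; a larger delta is needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


if __name__ == "__main__":
    print(regularized_inverse([[5.0, 11.0], [11.0, 25.0]], delta=1.0))
```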

An error correlation matrix (ECM(n)) is represented by $e(n)e^{T}(n)$.

$R_E(n)$ is the expectation of the error correlation matrix (EECM(n)). $R_E(n)$ is calculated using Equation (8) below.

$R_{E}(n) = \alpha R_{E}(n-1) + (1-\alpha)\, e(n)e^{T}(n)$  EQN (8)

An error power estimate (also referenced as Expectation of Error Power, EEP(n), or “ecorre”) is calculated using Equation (9) below.

$\mathrm{ecorre} = \alpha \cdot \mathrm{ecorre} + (1-\alpha)\, e^{T}(n)e(n)$  EQN (9)
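EQN (8) and EQN (9) are the same first-order smoother applied to the outer product $e(n)e^T(n)$ and the inner product $e^T(n)e(n)$, respectively. A one-step sketch follows, with the default α = 0.9 taken from the initialization described for FIG. 4. The function name is hypothetical. Because the trace of $e e^T$ equals $e^T e$, starting from $R_E(0) = 0$ and ecorre = 0 keeps TR($R_E$) equal to ecorre.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def update_error_stats(R_E: Matrix, ecorre: float, e: Sequence[float],
                       alpha: float = 0.9) -> Tuple[Matrix, float]:
    """One time step of EQN (8) and EQN (9).

    R_E(n) = alpha * R_E(n-1) + (1 - alpha) * e(n) e^T(n)    (outer product)
    ecorre = alpha * ecorre + (1 - alpha) * e^T(n) e(n)       (inner product)
    """
    size = len(e)
    R_new = [[alpha * R_E[i][j] + (1.0 - alpha) * e[i] * e[j] for j in range(size)]
             for i in range(size)]
    power = sum(v * v for v in e)
    return R_new, alpha * ecorre + (1.0 - alpha) * power


if __name__ == "__main__":
    R, ec = update_error_stats([[0.0, 0.0], [0.0, 0.0]], 0.0, [1.0, 2.0])
    print(R, ec)
```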

Background noise power (BNP(n)) is $\hat{\sigma}^2$. BNP(n) is estimated using Equation (10) or Equation (11) below.

$\begin{matrix}{\mspace{79mu} {{{\hat{\sigma}}^{2} = {\max \left( {\sigma^{2},{{ecorre} - {{{trace}\left( {{X^{T}(n)}{X(n)}} \right)}{D^{2}(n)}}}} \right)}}\mspace{79mu} {or}}} & {{EQN}\mspace{14mu} (10)} \\{{{BNP}(n)} = {\max \left( {\sigma^{2},{{{EEP}(n)} - {\frac{1}{L}{{TR}\left( {{FECM}(n)} \right)}{{MSTE}(n)}}}} \right)}} & {{EQN}\mspace{14mu} (11)}\end{matrix}$
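The sketch below follows EQN (11), which includes the 1/L factor; EQN (10) as printed omits it. The subtracted term can be read as the part of the error power explained by tap error, leaving the remainder as a noise estimate. σ² acts as a floor so that the estimate never drops below the known noise level. The function name and argument order are hypothetical.

```python
def background_noise_power(eep: float, fecm_trace: float, mste: float,
                           L: int, sigma2: float) -> float:
    """EQN (11): BNP(n) = max(sigma^2, EEP(n) - (1/L) * TR(FECM(n)) * MSTE(n))."""
    return max(sigma2, eep - fecm_trace * mste / L)


if __name__ == "__main__":
    print(background_noise_power(eep=1.0, fecm_trace=100.0, mste=0.5, L=100, sigma2=0.01))
    print(background_noise_power(eep=0.2, fecm_trace=100.0, mste=0.5, L=100, sigma2=0.01))
```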

The near end signal power estimate (NESPE(n)) estimates the power of the near-end signal. NESPE(n) is calculated using Equation (12) below:

$\mathrm{NESPE}(n) = 0.99\,\mathrm{NESPE}(n-1) + 0.01\, d^{2}(n)$  EQN (12)

An estimated echo signal power (EESP(n)) is computed using Equation (13) below:

$\mathrm{EESP}(n) = 0.99\,\mathrm{EESP}(n-1) + 0.01\, s^{T}(n)X(n)X^{T}(n)s(n)$  EQN (13)
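A one-step sketch of EQN (12) and EQN (13) follows. It uses the identity $s^T X X^T s = \lVert X^T s \rVert^2$, so the quadratic form is computed as the energy of the estimated echo vector $X^T(n)s(n)$ rather than by forming $X X^T$. The function names are hypothetical, and `d_n` stands for the scalar near-end sample d(n) in EQN (12).

```python
from typing import List, Sequence, Tuple


def echo_estimate(X: List[List[float]], s: Sequence[float]) -> List[float]:
    """X^T(n) s(n): the estimated echo vector produced by the current taps."""
    return [sum(X[r][c] * s[r] for r in range(len(s))) for c in range(len(X[0]))]


def update_power_estimates(nespe: float, eesp: float, d_n: float,
                           s: Sequence[float], X: List[List[float]]) -> Tuple[float, float]:
    """One time step of EQN (12) and EQN (13)."""
    nespe = 0.99 * nespe + 0.01 * d_n ** 2
    # s^T X X^T s equals ||X^T s||^2, the power of the estimated echo vector
    y = echo_estimate(X, s)
    eesp = 0.99 * eesp + 0.01 * sum(v * v for v in y)
    return nespe, eesp


if __name__ == "__main__":
    print(update_power_estimates(0.0, 0.0, 2.0, [1.0, 1.0], [[1.0, 0.0], [0.0, 2.0]]))
```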

The near end matrix (nem(n)), tap matrix (tm(n)) and error matrix (em(n)) are calculated by Equations (14a), (14b) and (14c), respectively.

$d(n) = \left[d_{0}(n) \ \cdots \ d_{0}(n-P_{s}+1) \ \cdots \ d_{M-1}(n) \ \cdots \ d_{M-1}(n-P_{s}+1)\right]^{T}$  EQN (14a)

$s(n) = \left[s_{0}^{T}(n) \ \cdots \ s_{M-1}^{T}(n)\right]^{T}$  EQN (14b)

$e(n) = d(n) - X^{T}(n)s(n)$  EQN (14c)
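EQN (14a) and EQN (14c) can be sketched as follows. The near-end vector stacks $P_s$ current and past samples from each of the M decimated subbands, and the error vector subtracts the echo estimate $X^T(n)s(n)$. It is assumed here that each subband near-end signal is available as a callable over its decimated index. The function names are hypothetical.

```python
from typing import Callable, List, Sequence


def near_end_vector(d_sub: Sequence[Callable[[int], float]], n: int, Ps: int) -> List[float]:
    """EQN (14a): [d_0(n) ... d_0(n-Ps+1) ... d_{M-1}(n) ... d_{M-1}(n-Ps+1)]^T.

    d_sub[i] returns the decimated near-end sample of subband i at index k.
    """
    return [d_sub[i](n - k) for i in range(len(d_sub)) for k in range(Ps)]


def error_vector(d_vec: Sequence[float], X: List[List[float]], s: Sequence[float]) -> List[float]:
    """EQN (14c): e(n) = d(n) - X^T(n) s(n)."""
    y = [sum(X[r][c] * s[r] for r in range(len(s))) for c in range(len(X[0]))]
    return [dv - yv for dv, yv in zip(d_vec, y)]


if __name__ == "__main__":
    subbands = [lambda k: float(k), lambda k: 10.0 * k]   # toy decimated subband signals, M = 2
    print(near_end_vector(subbands, n=5, Ps=2))
    print(error_vector([5.0, 4.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]))
```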

In the above equations, i and j each vary from 0 to M−1. The M error signals $e_0(n)$ 254, …, $e_{M-1}(n)$ 256 and the M² signals $u_{ij}$ are used to update the taps of the M adaptive subband filters 222 at time instant (n+1).

Referring now to FIG. 3, the steps for updating taps using a Subband Affine Projection (“SAP”) process are illustrated. The process starts at step 300. Process initialization occurs at time instant n=0. During process initialization, δ_init is set to zero (0).

In step 305, the subband polyphase matrices are computed. The SPM_{ij} are computed for i = 0 to M−1 and for j = 0 to M−1; that is, SPM_{00}, SPM_{10}, SPM_{20}, …, SPM_{M−1,0}, SPM_{01}, SPM_{11}, …, SPM_{M−1,M−1}. Additionally, $U_{ij}(n)$ can be computed recursively from $U_{ij}(n-1)$ by deleting the last column, shifting the first $P_s - 1$ columns to the right and adding a new column as the first column.
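The recursive update works because column c of $U_{ij}(n)$ equals column c−1 of $U_{ij}(n-1)$: both hold $u_{ij}(n - r - c)$ in row r. The sketch below performs the shift and checks it against a direct evaluation of EQN (4). Both function names are hypothetical, and the new first column is assumed to be $[u_{ij}(n), \ldots, u_{ij}(n - L_s + 1)]$.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def polyphase_matrix_direct(u: Callable[[int], float], n: int, Ls: int, Ps: int) -> Matrix:
    """EQN (4) evaluated from scratch: entry (r, c) = u(n - r - c)."""
    return [[u(n - r - c) for c in range(Ps)] for r in range(Ls)]


def polyphase_matrix_shift(U_prev: Matrix, new_column: Sequence[float]) -> Matrix:
    """U_ij(n) from U_ij(n-1): drop the last column, shift the first Ps-1
    columns one place to the right and insert the new column first.

    new_column must hold [u(n), u(n-1), ..., u(n-Ls+1)].
    """
    return [[v] + row[:-1] for v, row in zip(new_column, U_prev)]


if __name__ == "__main__":
    u = lambda k: float(k * k)   # toy branch signal
    U_prev = polyphase_matrix_direct(u, 9, Ls=3, Ps=2)
    U_next = polyphase_matrix_shift(U_prev, [u(10 - r) for r in range(3)])
    print(U_next == polyphase_matrix_direct(u, 10, Ls=3, Ps=2))
```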

The process moves to step 310, where FEM(n) is computed. After FEM(n) is computed, nem(n), tm(n) and em(n) are computed in step 315 using Equations (14a), (14b) and (14c) above.

The step size μ is set to one (1) in step 320 (i.e., μ = 1). In step 325, the regularization factor δ is set to an initial value, such that δ = δ_init.

Thereafter, the processor or special purpose controller computes the inverse of the far end correlation matrix (IFECM(n)) in step 330. To save computations, IFECM(n) can be computed recursively from IFECM(n−1) using known matrix inversion techniques.

The taps are updated in step 335 using Equations (15) and (16) below.

$\mathrm{tm}(n+1) = \mathrm{tm}(n) + \mu\, \mathrm{FEM}(n) \cdot \mathrm{IFECM}(n) \cdot \mathrm{em}(n)$  EQN (15)

$s(n+1) = s(n) + \mu\, X(n)\, \Pi^{-1}(n)\, e(n)$  EQN (16)
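One SAP iteration (EQN (14c) followed by EQN (16)) can be sketched as follows. As a design choice, the sketch solves $(X^T X + \delta I)\,g = e$ by Gaussian elimination instead of forming $\Pi^{-1}(n)$ explicitly; mathematically this is the same update. All function names are hypothetical. The affine projection property provides a check: with δ = 0, the a posteriori error $d(n) - X^T(n)s(n+1)$ equals $(1-\mu)\,e(n)$, and it vanishes for μ = 1.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A g = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("singular system; use a regularization factor delta > 0")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    g = [0.0] * n
    for r in range(n - 1, -1, -1):
        g[r] = (aug[r][n] - sum(aug[r][c] * g[c] for c in range(r + 1, n))) / aug[r][r]
    return g


def project(X: Matrix, s: Sequence[float]) -> List[float]:
    """X^T s, the estimated echo vector."""
    return [sum(X[r][c] * s[r] for r in range(len(X))) for c in range(len(X[0]))]


def sap_tap_update(s: Sequence[float], X: Matrix, d_vec: Sequence[float],
                   mu: float = 1.0, delta: float = 0.0) -> Tuple[List[float], List[float]]:
    """One SAP iteration: EQN (14c) then EQN (16). Returns (s(n+1), e(n))."""
    rows, cols = len(X), len(X[0])
    e = [dv - yv for dv, yv in zip(d_vec, project(X, s))]                  # EQN (14c)
    R = [[sum(X[r][i] * X[r][j] for r in range(rows)) + (delta if i == j else 0.0)
          for j in range(cols)] for i in range(cols)]                       # X^T X + delta I
    g = solve(R, e)                                                          # Pi^-1(n) e(n)
    s_next = [s[r] + mu * sum(X[r][c] * g[c] for c in range(cols)) for r in range(rows)]
    return s_next, e


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy 3 x 2 far end matrix
    s_next, e = sap_tap_update([0.0, 0.0, 0.0], X, [1.0, 2.0])
    print(s_next, e)
```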

The process then increments to the next time instant (i.e., time instant n+1) in step 340. Thereafter, the process proceeds to step 305 for time instant n+1.

In the SAP process of FIG. 3, the step size μ and the regularization factor δ are fixed in steps 320 and 325. Embodiments of the present disclosure utilize processes wherein the step size μ and the regularization factor δ are not fixed. A step size control logic (e.g., in the VSS 154) can determine an optimum μ in step 320 while keeping a constant δ in step 325. Additionally or alternatively, the VSS 154 can determine an optimum δ in step 325 while keeping a constant μ in step 320.

In another embodiment, illustrated in FIG. 4, a VSS process using a method of delay coefficients is utilized for echo cancellation. The process of determining μ, for insertion at step 320, starts at step 400. Process initialization occurs at time instant n=0. During process initialization, δ_init is set to twenty (20) times the sum of the subband powers of the far end signal u(n). Further during process initialization, α is set to zero point nine (0.9), and the expectation of the error correlation matrix (EECM(n)) is set to zero (0) as follows:

${R_{E}(0)} = \begin{bmatrix}0 & 0 & \ldots & 0 \\0 & 0 & \ldots & 0 \\\vdots & \vdots & \ddots & \vdots \\0 & 0 & \ldots & 0\end{bmatrix}_{{MP}_{s} \times {MP}_{s}}$

The processor or special purpose controller computes the inverse of the far end correlation matrix (IFECM(n)) in step 405. In step 410, EECM(n) is updated based on Equation (8) and FECM(n) is updated based on Equation (6) above. The calculation of FECM(n) involves the computation of non-zero lags of the Autocorrelation Function (hereinafter “ACF”) of the far end signal.

The process moves to step 415, wherein the processor or special purpose controller computes the variables numerator, denominator and MSTE(n). The numerator is calculated as TR(FECM(n)×IFECM(n)). The denominator is calculated as TR(EECM(n)×IFECM(n)).

Thereafter, the processor or special purpose controller calculates the optimum step size μ in step 420. The optimum μ is calculated using Equation (17) below, where D² is calculated using Equation (3) above. The value is clipped between zero (0) and two (2).

$\begin{matrix}{{\mu_{opt}(n)} = {\frac{1}{L}{D^{2}(n)}\frac{numerator}{denominator}}} & {{EQN}\mspace{14mu} (17)}\end{matrix}$

The final form of the optimum μ can be expressed as:

$\begin{matrix}{{\mu_{opt}(n)} = \frac{{{MSTE}(n)} \cdot {{Tr}\left( {{{FECM}(n)} \times {{IFECM}(n)}} \right)}}{L \cdot {{Tr}\left( {{{EECM}(n)} \times {{IFECM}(n)}} \right)}}} & {{EQN}\mspace{14mu} (20)}\end{matrix}$

In yet another embodiment, illustrated in FIG. 5, a process using a method of Non-Parametric Variable Step Size is utilized for echo cancellation. The process for determining μ, for insertion at step 320, starts at step 500. Process initialization occurs at time instant n=0. During process initialization, δ_init is set to twenty (20) times the sum of the subband powers of the far end signal u(n). Further during process initialization, α is set to zero point nine (0.9), and ecorre is set to zero (0).

The processor or special purpose controller updates EEP(n) in step 505. In step 510, the processor or special purpose controller computes the far end correlation matrix (FECM(n)). Then, in step 515, the processor or special purpose controller computes MSTE(n) using Equation (3) above. Additionally, the processor or special purpose controller computes BNP(n) using Equation (11) above.

Thereafter, the processor or special purpose controller calculates the optimum step size μ in step 520. The optimum μ is calculated using Equation (21) below, where ψ is a small positive constant (e.g., 0.0001) to prevent division by zero (0). The optimum μ is clipped between zero (0) and two (2).

$\begin{matrix}{{\mu_{opt}(n)} = \left\{ \begin{matrix}{1 - \sqrt{\frac{{BNP}(n)}{\psi + {{EEP}(n)}}}} & {{{EEP}(n)} > {\hat{\sigma}}^{2}} \\0 & {{{EEP}(n)} < {\hat{\sigma}}^{2}}\end{matrix} \right.} & {{EQN}\mspace{14mu} (21)}\end{matrix}$

In still another embodiment, illustrated in FIG. 6, a process based on a Variable Regularization Factor method is used for echo cancellation. The process for determining δ, for insertion at step 325, starts at step 600. Process initialization occurs at time instant n=0. During process initialization, μ is set to one (1), α is set to zero point nine (0.9), and ecorre is set to zero (0).

The processor or special purpose controller computes BNP(n) in step 605. Thereafter, the processor or special purpose controller calculates the optimum regularization factor δ in step 610. The optimum regularization factor (δ_(opt)(n) or BETA(n)) is given by Equation (22) below.

$\begin{matrix}{{\delta_{opt}(n)} = \frac{{{BNP}(n)} \cdot L}{P \times {{MSTE}(n)}}} & {{EQN}\mspace{14mu} (22)}\end{matrix}$

Referring now to FIG. 7, a simple block diagram illustrating the operation of a Divergence Detector 164 according to embodiments of the present disclosure is depicted. The Divergence Detector 164 is configured to detect whether or not the AF 152 is diverging. For example, if the echo path changes drastically, the AF 152 may diverge. If the AF 152 diverges, the Divergence Detector 164 detects the divergence and resets the AF 152 to an initial AF setting so that the AF 152 can quickly reconverge to the new system conditions.

The process commences at step 700, when either the AEC 150 turns on or the AF 152 has been reset. In step 705, the Divergence Detector 164 waits for a period of DDinitial seconds to elapse, e.g., four seconds. A DDinitial period of four seconds is exemplary; many other time durations can be established for the DDinitial period. After DDinitial seconds elapse, the Divergence Detector 164 begins computing a smooth version of MSTE(n).

The process moves to step 710, wherein a smooth version of MSTE(n), referenced as MSTE_SMOOTH(n), is computed. If MSTE_SMOOTH(n) monotonically increases over MONO_INCREASE samples, the Divergence Detector 164 declares a divergence and resets the AF 152. To determine whether MSTE_SMOOTH(n) is monotonically increasing over MONO_INCREASE samples, the Divergence Detector 164 computes MSTE_SMOOTH(n) using Equation (23) below and stores MSTE_SMOOTH(n) to a memory (not illustrated):

MSTE_SMOOTH(n)=0.999*MSTE_SMOOTH(n−1)+0.001*MSTE(n)  EQN(23)

In step 715, the Divergence Detector 164 determines whether MSTE_SMOOTH(n) is increasing over time by comparing MSTE_SMOOTH(n) with MSTE_SMOOTH(n−1). If MSTE_SMOOTH(n)>MSTE_SMOOTH(n−1), the process moves to step 720. If MSTE_SMOOTH(n)<MSTE_SMOOTH(n−1), the process proceeds to step 725.

In step 720, a count is incremented by one such that count=count+1. In step 725, the count is reset such that count=0. Thereafter, the process proceeds to step 730.

In step 730, the Divergence Detector 164 compares the count to a predetermined value, MONO_INCREASE. If the count is greater than MONO_INCREASE (i.e., count>MONO_INCREASE), a divergence has been detected and the process moves to step 735, in which the Divergence Detector 164 resets the AF 152 and returns to step 700. Otherwise, the process moves to step 740, wherein the time index is incremented such that n=n+1, and the process returns to step 710 to compute the next MSTE_SMOOTH(n).
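The FIG. 7 loop (steps 700 through 740) can be sketched as a small stateful Python class. This is an illustration under assumptions: DDinitial is expressed in samples rather than seconds, the smoother is seeded with the first MSTE(n) value after the wait, and the class and method names are hypothetical. Resetting the AF 152 itself is left to the caller whenever `update` returns `True`.

```python
class DivergenceDetector:
    """Sketch of the FIG. 7 divergence-detection loop (steps 700-740)."""

    def __init__(self, mono_increase, warmup_samples):
        self.mono_increase = mono_increase    # MONO_INCREASE
        self.warmup_samples = warmup_samples  # DDinitial, in samples (assumption)
        self.reset()

    def reset(self):
        """Step 700: state after the AEC turns on or the AF is reset."""
        self.elapsed = 0
        self.mste_smooth = None
        self.count = 0

    def update(self, mste):
        """Feed MSTE(n); return True when a divergence is declared at this sample."""
        if self.elapsed < self.warmup_samples:
            self.elapsed += 1  # step 705: wait for DDinitial to elapse
            return False
        if self.mste_smooth is None:
            self.mste_smooth = mste  # seeding the smoother is an assumption
            return False
        prev = self.mste_smooth
        self.mste_smooth = 0.999 * prev + 0.001 * mste  # EQN (23), step 710
        if self.mste_smooth > prev:  # step 715
            self.count += 1          # step 720
        else:
            self.count = 0           # step 725
        if self.count > self.mono_increase:  # step 730
            self.reset()  # step 735: the caller also resets the AF
            return True
        return False      # step 740: advance to n+1


det = DivergenceDetector(mono_increase=3, warmup_samples=2)
flags = [det.update(m) for m in [1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 10.0]]
print(flags)  # [False, False, False, False, False, False, True]
```

Because the smoother output rises exactly when MSTE(n) exceeds the previous smoothed value, a run of more than MONO_INCREASE such samples triggers the reset.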

Referring now to FIG. 8, a simple block diagram illustrating the operation of a Double Talk Detector (DTD) 156 according to embodiments of the present disclosure is depicted. As stated above, the DTD 156 is configured to output a “1” in response to the DTD 156 detecting DT. If DT is not detected, the DTD 156 outputs a “0”. The output of the DTD 156 is input into MUX 158. If the MUX 158 receives a “1” from the DTD 156, the MUX 158 passes the AEC output 160 to the far-end. If the MUX 158 receives a “0” from the DTD 156, the MUX 158 masks the AEC output 160 and passes a comfort noise 162 to the far-end. As such, a residual echo, which remained in the AEC output 160 because it was not cancelled by the AF 152, is not present in the signal 150 transmitted to the far end.

The process commences at step 800, wherein the AEC 150 is turned on. For the purposes of illustration, an exemplary embodiment using a delay coefficients based method of VSS, described in further detail with reference to FIG. 6 above, is utilized.

In step 810, the DTD 156 receives, as an input from the VSS 154, the regularization factor from a delay coefficients based variable regularization method of the VSS 154. The regularization factor is denoted in FIG. 2 as BETA=β(n) 170. The DTD 156 also receives, as inputs, EESP(n) and NESPE(n).

In step 815, the DTD 156 computes a smoothed version of BETA(n), referenced as BETA_SMOOTH(n), using Equation (24) below.

BETA_SMOOTH(n)=0.9*BETA_SMOOTH(n−1)+0.1*BETA(n)  EQN(24)

The process moves to step 820, wherein another quantity, BETA_DIFF, is computed using Equation (25) below.

BETA_DIFF(n)=BETA_SMOOTH(n)−BETA_SMOOTH(n−1)  EQN(25)

In step 825, the DTD 156 computes a ratio, CCDTD, using Equation (26) below.

CCDTD(n)=EESP(n)/NESPE(n)  EQN(26)
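Equations (24) through (26) can be sketched as a per-sample Python update. The initial BETA_SMOOTH value of zero and the handling of NESPE(n)=0 are assumptions not stated in the text, and the class name is hypothetical.

```python
import math


class DtdFeatures:
    """Per-sample computation of Equations (24), (25) and (26)."""

    def __init__(self):
        self.beta_smooth = 0.0  # initial BETA_SMOOTH value is an assumption

    def update(self, beta, eesp, nespe):
        """Return (BETA_DIFF(n), CCDTD(n)) for the current sample."""
        prev = self.beta_smooth
        self.beta_smooth = 0.9 * prev + 0.1 * beta  # EQN (24)
        beta_diff = self.beta_smooth - prev         # EQN (25)
        # EQN (26); mapping NESPE(n) = 0 to infinity (so CONDITION2 fails) is an assumption.
        ccdtd = eesp / nespe if nespe > 0.0 else math.inf
        return beta_diff, ccdtd


features = DtdFeatures()
print(features.update(beta=1.0, eesp=0.2, nespe=0.4))
```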

Then, the DTD 156 processes Equations (24), (25) and (26) in step 830.

The DTD 156 then applies decision logic (pseudo-code) based on two conditions. In step 840, the DTD 156 evaluates a CONDITION1 and a CONDITION2. DT is detected if both conditions are satisfied; if either condition is not satisfied, DT is absent. CONDITION1 is defined as BETA_DIFF(n)>BETA_THRESHOLD. CONDITION2 is defined as CCDTD(n)<CCDTD_THRESHOLD.

In step 845, the DTD 156 determines whether CONDITION1 and CONDITION2 are satisfied. If both are satisfied, the process moves to step 850, where the count is set such that COUNT=HOVER+1, and then to step 855, where the value of COUNT is checked. If either CONDITION1 or CONDITION2, or both, are not satisfied (i.e., the conditions fail), the process moves directly to step 855. In step 855, the DTD 156 determines whether the count is greater than zero (0), i.e., count>0. If count>0, the process moves to step 860, in which the count is reduced by one such that count=count-1, the tap s(n) is set to s(n)=S_(REF) (i.e., no tap update is performed), and a value DTD_DEC is set to DTD_DEC=1. Otherwise (i.e., count=0), the process moves to step 865, wherein DTD_DEC=0 and a VSS tap update is performed.

It should be noted that, at any instant, the process is in either step 865 or step 860. After step 860 or step 865, the process returns to step 805, i.e., it moves to the next time instant and starts a new set of processing.
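The step 840 to step 865 decision, including the HOVER hangover counter, can be sketched in Python as follows. Threshold values are left as parameters because the text does not give them; the class name and the returned tuple are hypothetical.

```python
class DoubleTalkDecision:
    """Sketch of steps 840-865: two-condition DT test with a HOVER hangover."""

    def __init__(self, beta_threshold, ccdtd_threshold, hover):
        self.beta_threshold = beta_threshold    # BETA_THRESHOLD
        self.ccdtd_threshold = ccdtd_threshold  # CCDTD_THRESHOLD
        self.hover = hover                      # HOVER
        self.count = 0

    def step(self, beta_diff, ccdtd):
        """Return (DTD_DEC, perform_tap_update) for the current sample."""
        condition1 = beta_diff > self.beta_threshold
        condition2 = ccdtd < self.ccdtd_threshold
        if condition1 and condition2:    # step 845
            self.count = self.hover + 1  # step 850
        if self.count > 0:               # step 855
            self.count -= 1              # step 860: s(n) = S_REF, no tap update
            return 1, False
        return 0, True                   # step 865: VSS tap update performed


dtd = DoubleTalkDecision(beta_threshold=0.05, ccdtd_threshold=0.5, hover=2)
print([dtd.step(0.1, 0.2)] + [dtd.step(0.0, 0.9) for _ in range(3)])
# [(1, False), (1, False), (1, False), (0, True)]
```

Setting the count to HOVER+1 keeps DTD_DEC=1 for the detection sample plus HOVER further samples, so brief gaps in the two conditions do not immediately re-enable tap updates.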

Once every ONCE_IN samples, the tap s(n) is stored in a memory (not illustrated). If a divergence detection event occurs, the latest tap stored in memory MONO_INCREASE time samples before that event is always stored into S_(REF). The value of ONCE_IN could be set within the range of 2000-8000 samples.
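A possible sketch of the S_(REF) bookkeeping follows, assuming that "the latest tap stored MONO_INCREASE time samples before a divergence detection event" means the most recent snapshot taken at or before sample n−MONO_INCREASE. The bounded snapshot history (`max_snapshots`) and all names are hypothetical.

```python
from collections import deque


class TapSnapshotter:
    """Sketch of the ONCE_IN tap snapshots and the selection of S_REF."""

    def __init__(self, once_in, mono_increase, max_snapshots=16):
        self.once_in = once_in              # ONCE_IN, e.g. 2000-8000 samples
        self.mono_increase = mono_increase  # MONO_INCREASE
        # Bounded history of (sample index, taps); max_snapshots is hypothetical.
        self.snapshots = deque(maxlen=max_snapshots)
        self.s_ref = None

    def observe(self, n, taps):
        """Store a copy of the tap vector s(n) once every ONCE_IN samples."""
        if n % self.once_in == 0:
            self.snapshots.append((n, list(taps)))

    def on_divergence(self, n_detect):
        """Copy the latest snapshot taken at or before n_detect - MONO_INCREASE into S_REF."""
        cutoff = n_detect - self.mono_increase
        for stored_n, taps in reversed(self.snapshots):
            if stored_n <= cutoff:
                self.s_ref = list(taps)
                break
        return self.s_ref
```

Skipping snapshots newer than the cutoff avoids restoring taps that were already captured while MSTE_SMOOTH(n) was climbing toward the divergence decision.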

Referring now to FIG. 9, a graph 900 of an echo and a cancelled echo for a constant step size is illustrated. The graph 900 illustrates echo cancellation using a process in which step size control logic is not utilized. As such, the step size is fixed (e.g., μ=1).

The echo f 902 is measured over several time samples 904. Additionally, the estimated echo f′ (not illustrated) is measured over several time samples 904. A residual echo 906 is obtained by cancelling the estimated echo f′ from the echo f 902 (i.e., f−f′). As illustrated in the graph 900, the residual echo 906 is large and dominant, which indicates that the adaptive filter 110 is diverging.

Referring now to FIG. 10, a graph 1000 of an echo and a cancelled echo using step size control logic is illustrated. The graph 1000 illustrates echo cancellation using a process in which control logic determines at least one of the step size (e.g., determines the optimum μ) and the regularization factor (e.g., determines the optimum δ) according to embodiments of the present disclosure.

The echo f 1002 is measured over several time samples 1004. Additionally, the residual echo (f−f′) 1006 is measured over several time samples 1004. As illustrated in the graph 1000, the residual echo 1006 is very small.

Referring now to FIG. 11, a graph 1100 of the mean square tap error is plotted. The graph 1100 compares a prior art process with processes according to embodiments of the present disclosure. A first plot 1102 illustrates performance of a prior art process. A second plot 1104 illustrates performance of the first delay coefficients process. A third plot 1106 illustrates performance of the second delay coefficients process. A fourth plot 1108 illustrates performance of the non-parametric variable step size process. A fifth plot 1110 illustrates performance of the variable regularization factor process. A sixth plot 1112 illustrates performance of the delay coefficients process with a constant step size μ=1.

The graph 1100 illustrates that, for the first through fifth plots 1102-1110, the mean square tap error decreases with time, which indicates that the adaptive filter 110 does not diverge. However, for the sixth plot 1112, the mean square tap error increases over time. As such, when a constant step size μ=1 is used without step size control logic, the adaptive filter 110 diverges.

While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.

1. A device for Acoustic Echo Cancelling in communications equipment, the Acoustic Echo Cancelling device comprising: a near end interface; a far end interface; an adaptive filter coupled between said near end interface and said far end interface; and a variable step size controller coupled to said adaptive filter, said variable step size controller comprising a processor, a computer readable medium, and a plurality of instructions wherein at least a portion of said plurality of instructions are storable in said computer readable medium, and further wherein said plurality of instructions are configured to cause said processor to determine at least one of a step size and a regularization factor, wherein said at least one of said step size and said regularization factor is transmitted to said adaptive filter to provide an estimated echo signal.
 2. The device of claim 1, said device further comprising a divergence detector coupled to said adaptive filter, said divergence detector configured to detect a divergence of said adaptive filter.
 3. The device of claim 2, wherein said divergence detector is configured to reset said adaptive filter in response to detecting said divergence of said adaptive filter.
 4. The device of claim 1, said device further comprising a double talk detector coupled to said variable step size controller, said double talk detector configured to detect an occurrence of a double talk and transmit an output signal indicating said occurrence of said double talk.
 5. The device of claim 4, said device further comprising a multiplexor coupled between said near end interface and said far end interface, said multiplexor adapted to mask an AEC output signal.
 6. The device of claim 5, wherein said output signal from said double talk detector is a DT detected signal transmitted to an input of said multiplexor and wherein said multiplexor masks said AEC output signal in response to receiving said DT detected signal.
 7. The device of claim 1 wherein said processor determines the step size by performing the steps of: computing an inverse far end correlation matrix; updating an expectation of error correlation matrix; updating a far end correlation matrix; computing a plurality of variables, the plurality of variables comprising a numerator, denominator and a mean-square tap error (MSTE); and calculating said step size based on a product of said MSTE, said numerator and said denominator.
 8. The device of claim 1 wherein said processor determines said step size by performing the steps of: updating an expectation of error power; computing a far end correlation matrix and a background noise power; computing a mean-square tap error (MSTE); and calculating said step size based on said MSTE, said expectation of error power and said background noise power.
 9. The device of claim 1 wherein said processor determines said regularization factor by performing the steps of: computing background noise power; and calculating said regularization factor based on said background noise power.
 10. A method for acoustic echo cancelling in communications equipment, the method comprising: estimating an echo from a received signal using a subband affine projection filter, wherein the step of estimating further comprises determining at least one of a step size and a regularization factor; and cancelling the estimated echo from a near-end signal to produce an AEC output signal.
 11. The method of claim 10, the method further comprising detecting a divergence of the subband affine projection filter.
 12. The method of claim 11, the method further comprising resetting the subband affine projection filter in response to detecting the divergence of the subband affine projection filter.
 13. The method of claim 10, the method further comprising detecting a double talk in the near-end signal.
 14. The method of claim 13, the method further comprising masking the AEC output signal in response to detecting the double talk in the near-end signal.
 15. The method of claim 10, wherein the step of determining at least one of the step size and the regularization factor comprises: determining an optimal step size; and setting the regularization factor to a fixed value.
 16. The method of claim 15 wherein determining the optimal step size comprises: computing an inverse far end correlation matrix; updating an expectation of error correlation matrix; updating a far end correlation matrix; computing a plurality of variables, the plurality of variables comprising a numerator, denominator and a mean-square tap error (MSTE); and calculating said step size based on a product of said MSTE, said numerator and said denominator.
 17. The method of claim 15, wherein determining the optimal step size comprises: updating an expectation of error power; computing a far end correlation matrix and a background noise power; computing a mean-square tap error (MSTE); and calculating said step size based on said MSTE, said expectation of error power and said background noise power.
 18. The method of claim 10, wherein the step of determining at least one of the step size and the regularization factor comprises: determining an optimal regularization factor; and setting the step size to a fixed value.
 19. The method of claim 18, wherein determining the optimal regularization factor comprises: computing background noise power; and calculating said regularization factor based on said background noise power.
 20. A system for acoustic echo cancellation, said system comprising: a near-end interface, said near-end interface further comprising a microphone and a speaker; a far-end interface; an acoustic cancellation device coupled between said near-end interface and said far-end interface; wherein said acoustic cancellation device is configured to estimate an echo using a subband affine projection process wherein at least one of a step size and a regularization factor is determined by a variable step size controller.